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I. Introduction: The SASE IV Program 

Series of events occurring randomly in time or space arise in many 
technological and scientific contexts. In statistics these event processes are 
called "stochastic point processes." The events may be the failures of a com- 
puter or the arrivals of nerve impulses at a synapse; the emissions of particles 
by a radioactive source or the errors occurring during transmission of binary 
data. An example is a study (Lewis, 1964) of computer failure patterns in time. 
Detailed statistical analysis (Lewis, 1964) assessed how one event type (mainte- 
nance) affected another (computer failure). It became evident even then that 
computer "reliability" would have to be broadened to include not only physical 
failures but also "congestion" failures. 

Two examples of the analysis of series of events related to the problem of 
computer system performance evaluation are the analyses of a series of page excep- 
tions by Lewis and Shedler (1973) and of transaction times in a data base system 
by Lewis and Shedler (1976). An analysis of page exceptions to two memory levels 
was given by Gaver, Lewis and Shedler (1974) and involved multivariate series of 
events; we discuss these later and concentrate first on univariate series of 
events . 

SASE VI is an extension of the earlier SASE IV described by Lewis, Katcher 
and Weis (1969) and programmed in FORTRAN for IBM Model 360 and IBM Model 370 
systems. SASE IV in turn was an outgrowth of SASE I and SASE II described by 
Lewis (1966). All the programs have been used on other manufacturers systems, 
particularly by neurophysiologists. SASE VI has extended plotting capabilities 
which make the data analytic examination of sets of data much simpler; otherwise 
it is similar to SASE IV. The extensions are discussed in the APPENDIX. 

To understand the type of analysis performed by SASE VI, it is necessary to 
consider series of events in some detail. Such a series is shown schematically 
in Figure 1, events being represented by dots on the time axis. 

In SASE VI, only univariate series of events are considered. The events can 
thus be distinguished only by the times at which they occur; other quantitative 
or qualitative data, such as the type of the event, are ignored for the purpose 
of analysis. Consequently the times-to-events T^, with 0<T^<T2<T^< . . . , or 
equivalently the t imes -between -e vent s , X^, can characterize the process. 

There is, however, another equivalent way in which a series of events may be 

characterized. This is in terms of the counting process N(t), the number of 

events occurring from the onset of measurement to time t. The representations 

of the series of events in terms of the times between events X. and the num- 

1 

ber of events occurring in a given interval N(t) are related by the following 
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Fig. 1. Equivalent representations of series of events by 
the counting process N(t) and the t imes-to-events {T^}. 



fundamental relationships, implicit in Figure 1: 



and 



N(t) < n if and only if T 

n 



I X. 
j=l ' 



> t 



prob{N(t) < n} = prob{T > t} 

n 



( 1 ) 



( 2 ) 



for n = 1 ,2 , . . . . 

These relationships seem almost trivially obvious; yet they have important 
consequences. For example (if we are analyzing a series of events which are 
assumed to be stationary) , a statistical analysis based on estimated correlation 
or spectral properties of the counting process, {N(t)}, is generally not equiv- 
alent to an analysis based on correlation or spectral properties of the interval 
process, {X^}. Thus, if one is to explore a point stochastic process in detail, 
analysis should be made both of the number of events occurring over time and of 
the sequence of intervals separating one event from another. The exact inter- 
relationship between the two types of analysis of stationary processes is implic- 
it in (2) but is difficult to explicate; in certain cases it is clear which is 
most important. In superpositions, for instance, the spectral analysis of the 
interval process is much more informative than the spectral analysis of the 
counting process. 

The SASE VI program, consisting of a main program and ten subroutines, 
carries out the computations for these two correlational analyses as well as 
some more particular types of analysis; for details, see Lewis, Katcher and Weis 
(1969) and the Appendix to this paper. The overall analysis differs from an 
ordinary time-series analysis in two chief respects: (i) the (interval) 

process is a time series in which the variables take on only positive values and 
do not have a Gaussian (normal) distribution, and (ii) the analysis of the 
{N(t)} (counting) process has no direct counterpart in standard time-series 
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analysis. With regard to (i) , a central role is played by the exponential 
distribution. Moreover, SASE IV can be used to analyze sequences of positive 
random variables such as the successive response times in a computer system. 

Assumptions of stationarity of the point process are very iffy when analyzing 
data, for example, from computer systems, and further distinctions between the 
analysis of series of events and ordinary time-series analysis become evident in 
considering trends in a series of events. There are those which are best speci- 
fied in terms of time, while others are most appropriately considered as func- 
tions of the event parameter i. Thus, computer failure-susceptibility might in- 
crease smoothly with time, affecting the intervals between events (failures) 
indirectly and in a complex fashion. On the other hand, if the events are the 
occasions when a rat presses a bar for food in a conditioning experiment, the 
increasing rate of events will be primarily a function of the number of previous 
bar pressings. Again time of day effects, e.g. effect of time of day on number 
of transactions in a data base system, are naturally specified through N(t) and 
are not direct functions of serial number. 

SASE VI includes a capability for detection of several types of trends. Thus, 

the TREND subroutine tests to determine if in a Poisson process the rate changes 

over time. (A Poisson process is an event process in which the number of events 
in non-overlapping time intervals are independent and have Poisson distributions.) 
Other subroutines (see Figure 2) specify further details concerning the nature 
and properties of the trend. Subroutine TREND also calculates quantities which 
can be used to test for trends using least-squares regression methods. These 
methods are flexible and allow various types of trends to be tested, working 
reasonably well under fairly weak assumptions as to the detailed structure of the 
series under study. 

Where TREND does not indicate the presence of trends, stationary processes 
are involved. SASE subroutines (Figure 2) then test for fixed-rate Poisson pro- 
cesses; for correlation between intervals, i.e. for "renewal processes," in which 

the probabilistic structure of events and intervals "starts from scratch" after 
an event occurs. (A fixed-rate (homogeneous) Poisson process is one particular 
instance of a renewal process, but so are others in which the values of the vari- 
able are independent and have the same distributions.) 

SASE VI subroutines also carry out spectral analysis for both the {N(t)} 
and {X^} processes. 

SASE VI has been widely used on a variety of computers with no apparent 
problems. This portability has been achieved at the cost of considerable sloppi- 
ness or looseness of programming. 
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SASE Subroutines 



Several drawbacks in SASE VI include the need to load the whole program 
(approximately 250K bytes of core are needed), and the limitation to 2000 events. 
It is hoped to rectify these problems in a later version of the program. Also, 
several new developments in the statistical analysis of series of events which 
are described in the next section of this paper will be incorporated. 

An APL version of parts of SASE VI has been written, but is not generally 
available. It offers several advantages over the FORTRAN version. I believe 
that with large workspaces and the input-output capabilities of recent versions 
of APL, this is the way to go. The interactive and array handling capabilities 
of APL make it ideal for data analysis. 

II . Recent Developments 

Recent developments in the statistical analysis of point processes have been 
summarized by Lewis (1972); see also Cox (1972) and Brown (1972). There is also 
a recent book on point processes by Snyder (1975) and a sequence of papers by 
Brillinger (1975a, 1975b). Brillinger’s work is based heavily on spectral meth- 
ods for stationary processes and has many points of contact with the work of Cox 
and Lewis. But Snyder (1975) does not reference any of Brillinger’ s work, and 
does not reference Cox and Lewis (1966) . What then is the point of contact 
between these lines of development in the analysis of series of events? 

The work of Cox and Lewis and the work of Brillinger are fairly complementary, 
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the former being highly data analytic in the sense that there is concern with 
validation of assumptions and models and analysis of trends, the latter being 
concerned mainly with spectral methods for stationary (univariate and multivari- 
ate) point processes based on models such as self-exciting point processes 
(Hawkes, 1972; Hawkes and Oakes, 1974) which lend themselves easily to spectral 
methods. I will touch on other differences later; the reader may be interested 
in a new introduction to spectral methods by Bloomfield (1976). 

Snyder (1975) has based his statistical methods almost entirely on likelihood 
analysis and a strong assumption that the stochastic mechanism generating the 
series of events is known and that the sample function density can be written 
down. I have my doubts that this will be useful in analyzing data from computer 
systems, especially since there are no compelling physical models other than, in 
some cases, Poisson and Poisson cluster processes for this type of data. 

Consider now some specific areas of development. 

(1) Trend analysis and detrending of point processes . 

In many fields of application, it has become increasingly apparent that 
stationary point processes are at best a convenient mathematical fiction. Most 
data exhibit fairly subtle trends and methods for testing for these trends are 
known (Cox and Lewis, 1966, Ch. 3); other data, however, exhibit gross trends, 
e.g. time-of“day effects in the series of arrivals at a queue, and techniques for 
the analysis and characterization of such data are only now beginning to be 
developed. 

The situation is analogous to that in ordinary regression analysis and time- 
series analysis where one wants, for example, to estimate parameters in an 
assumed (linear) function for the mean, test the model for the mean function and 
then examine the model which is assumed for the residuals. The latter could, 
include examining the residuals to test for independence, estimating the spectrum 
of the residuals and testing the assumed normality of the residuals. Techniques 
for these problems in the linear normal model are known (see e.g. Hannan, 1970). 

By comparison, in point processes one might want to (i) estimate the rate 
function X(t), using either specific functional models or smoothing techniques; 
(ii) test specific functional models for X(t); (iii) detrend the point process, 
examine the * residual* process and test the usual hypothesis that the events are 
generated by a homogeneous Poisson process. 

When dealing with nonhoraogeneous Poisson processes, the most appropriate de- 
trending technique (Lewis, 1970, 1972) seems to be to transform the time-scale 
so that the i— event occurring at time t^ now occurs at time 
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"l = 



X(u)du, 



where X(u) is some estimate of the rate function X(t). Note that if X(t) is 
known, the ^ Poisson process. When X(t) is estimated from the data 

the distributional problems associated with determining properties of the {t.} 
processes are difficult. 



Results for estimating parametric rate functions and given in Cox (1972) and 
Lewis (1972) and are developed in Lewis and Shedler (1976), where they are 
applied to the statistical analysis of transaction times in data base systems. 

Fairly regular point processes are probably best dealt with by using log 
transforms of the intervals between events and then using ordinary time-series 
methods (Cox and Lewis, 1966, Ch. 3). 

(2) Spectral analysis of point processes . 

By spectral analysis of a point process (Bartlett, 1963) we mean the second- 
order spectrum of the counting function N(t) (the count spectrum). Brillinger 
(1972) has put this spectral analysis on a firm footing in the context of a 
general spectral theory for stationary interval functions such as N(t). He also 
proposed the use of higher-order spectra. 

The spectral analysis may also be thought of as an ordinary second-order 
spectral analysis of a function dN(t) which is a series of delta functions 
occurring at random times (see Lewis, 1970, for a heuristic interpreta- 

tion) . Note that it is not a second-order spectral analysis of the intervals 
between events latter spectrum is useful for differentiating 

between renewal processes (for which it is flat) and non-renewal point processes. 
It may in fact be preferable to higher-order spectra of counts (Brillinger, 1972) 
in that it can be expected to exhibit fewer sampling fluctuations; in general 
my feeling is that a spectral analysis of the counts and the intervals should be 
tried before one goes to higher order spectra. 

Note that a renewal process is completely specified by its second-order count 
spectrum. The spectrum is in fact essentially the Fourier transform of the 
renewal density or the intensity function. 

One drawback to the spectral analysis of point processes is the large amount 

of time required for computation of spectral estimates. Only recently have 

French and Holden (1971), in an important paper, found a way to use the fast 

Fourier transform (FFT) in this problem. To illustrate their technique, consider 

the case when the series is observed for a fixed time t^ and n events occur 

at times T,<T^<...<T , where T < t^. Note that n is the observed value of 
1 2 n n 0 
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the random variable NCt^), 



Empirical spectral analysis of the point process is usually based on the 
finite Fourier-Stielt jes transform of N(t), 

t. 



(w) = (irt^) 



- 1/2 



itu) 

J " ' 



dN(t) 



t=0 

n 



= 1 exp(iT w) 

j=l ^ 



(3) 



(4) 



= (TTtQ) 



- 1/2 



{ ^ cos 

j=l 



n 

(T.u)) + i I sin(T.u))}, 

3 j=l J 



(5) 



and the periodogram 

I (w) = |h (w)|2 . (6) 

0 0 

The sum in (4) is not in the form required for the fast Fourier transform FFT 

algorithm (see, e.g. Cooley, Lewis and Welch, 1970, or Bloomfield, 1976), so that 

for each frequency for which the spectrum is required the number of operations 

required is of the order n. The periodogram is usually computed at frequencies 

u) such that t.u) = 27rp, where p = !,.••,?, since the estimates are "well- 
P w p 

behaved" at these points (Cox and Lewis, 1966, Ch, 5). Since P is of the order 
of n, the total number of computations is of order n^. 



Note that a point process can be sampled completely through knowledge of the 

event times, and the spectrum is not band-limited. However, French and Holden 

(1971) considered the realistic case where an upper frequency of interest co = W 

P 

was given, and considered the continuous time process 

t^ 



Y(t) = 



sin W(t-x) 
2TT(t-T) 



dN(t) 



(7) 



n sin W(t-T.) 

y L 

27T(t-T^) 



( 8 ) 



The Stieltjes convolution of N(t) with the sin(Wt)/ (2Trt) function is 
equivalent to multiplying the spectrum by a frequency window which is flat to 
frequency W and zero thereafter. By sampling Y(t) at intervals of width 
AT = tt/W, no frequency information is lost (aliasing) in the spectrum of Y(t), 
which is the same as that of the point process up to frequency co- The FFT can 
be applied to transform the sampled values of Y(t); the main point in the tech- 
nique, however, is that the sine functions have the same value for all t of the 
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form KAt, where k is an integer . Thus, computation of the Y(fcAt)*s is 
relatively simple and a considerable gain in computing speed shoudl be obtained. 

There are some problems with this technique, e.g. it is not bias-free, as 
asserted by French and Holden (1971), but it appears to be of great value, 

I will return to Brillingers higher-order count spectra when I discuss new 
models. 

(3) Multivariate point processes 

In almost all real cases we are interested in interactions between stochastic 
processes occurring at different places in space and time. Examples are the 
following : 

(i) The spike trains (essentially point processes) occurring at the 
inputs and outputs of a neuron. 

(ii) The successive response times in a data base system and the process 
of transaction initiation times at a certain point in the system. 

(iii) Times of occurrence of page exceptions in a two-level memory multi- 
programmed computer system operating under demand paging (Gaver, 

Lewis, Shedler, 1972). 

(iv) Times of occurrence of earthquakes at different points in California. 

The examples are by no means exclusive but illustrate the main applications. 
We will only discuss multivariate point processes (Cox and Lewis, 1972) which 
might be thought of as a point process in which qualitative marks associated with 
each event can partition the set of events, although we note that Brillinger^s 
work (Brillinger, 1972) encompasses more general situations. The analysis of 
multivariate point processes is discussed by Cox and Lewis (1966, Ch. 8; 1972), 
Perkel, Genstein and Moone (1966) and Brillinger (1972; 1975a, 1975b) and I will 
make only general comments here. 

a. Dependencies between two stationary processes are usually handled via 
spectral methods, i.e. second-order cross-spectra which are then 
normalized to give quantities called coherences. This is Brillinger* s 
approach, but it is not at all clear how useful second-order spectra 
are for point processes, which are of course a long way from normal 
processes in which the second-order spectra completely specify the 
dependence . 

b. There is a problem of specifying what kind of dependency structures 
will occur in multivariate point processes. One can, for instance, 
generate many bivariate Poisson processes, defined to be bivariate 
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point processes in which the individual (marginal) processes are 
Poisson. A start at examining these structures has been given by 
Cox and Lewis (1972). 

c. The above leads into questions of suitable models for multivariate 
point processes and this has been addressed in a tentative way by Cox 
and Lewis (1972) but is still wide open. Mutually exciting point 
processes ( Hawke s , 1972) are being used but are very little under- 
stood and I believe are very untrac table. Some of our recent work 
(Lawrance and Lewis, 1976; Jacobs and Lewis, 1976), when extended to 
multivariate processes, gives promise of much simpler models. 

d. Spectral methods for examining dependencies in point processes are 
only one of many tools for doing this job and I have expressed my 
reservations above. There is a paper by Cox (1972) which is impor- 
tant and offers techniques based on likelihoods which needs further 
explanation. It might, in particular, be useful for analyzing 
dependencies in non-s tat ionary point processes (see e.g. Lewis and 
Shedler, 1976); spectral methods are not applicable for non-station- 
ary point processes. 

e. Finally graphical, data analytic methods, rather than shotgun methods, 
should be used. This is particularly true since it is not completely 
evident that there is always stable stochastic structure in some of 
the processes encountered in examining computer systems. 

An example of a very fruitful, simple graphical analysis of a bivariate point 
process occurred in Gaver, Lewis, and Shedler (1974). There a plot of the inter- 
vals between hits (page exceptions) to a lower-level memory were plotted against 
the number of hits to the higher level memory during that interval. The result- 
ing plot is striking in that it consists (within statistical fluctuations) of two 
separate straight lines through the origin. The main thrust of the paper (Gaver, 
Lewis, Shedler, 1974) is to explain this structure. 

This is really the way data analysis should be used; to suggest models or 
modify postulated models. 

(4) Models 

One aspect of the analysis of computer system data in the series of papers by 
Lewis and Shedler (1973); Gaver, Lewis and Shedler (1974) and Lewis and Shedler 
(1976) which comes home forcibly is that our techniques for analyzing time series 
data is heavily influenced by techniques for analyzing normal time series (c.f. 
Anderson, 1971). Thus one doesn’t concern oneself much with the marginal distri- 
bution of the data, and second order correlation structure is all one looks at 
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unless one is looking at non--linear systems. 

But the marginal distributions in most computer data are highly skewed 
positive random variables and are informative per se. Thus in the data on page 
exceptions in Shedler and Lewis (1973) a spectral analysis gave a clear indica- 
tion of dependence between successive times between page exceptions and a quasi- 
oscillatory phenomenon which was then explainable. But the times between excep- 
tions were very highly skewed and discrete. This gave rise to worry about the 
sampling theory of the spectral estimates, but a closer look at the empirical 
distribution function for the times between page exceptions showed that 5% of 
the times were exactly of length 1024, This was found to be a situation in which 
the program references sequentially each of the (4-byte) words in a (4k-byte) 
page and then goes on in a similar fashion to the next page. 

Besides the fact such information from a data analysis is highly informative 
(and in fact more useful than a postulated model) analysis and modelling tech- 
niques for highly skewed, mixed and dependent random variables are non-existent, 
Lewis and Shedler (1973) used a 2-state univariate semi-Markov generated point 
process (Cox and Lewis, 1966, Ch, 7) to model the data, but estimation of 
parameters was perforce crude. 

This has led us to derive new models which model the marginal distributions 
of the time series {X^} separately from its second-order correlation structure 
(Gaver and Lewis, 1976; Lawrance and Lewis, 1976; Jacobs and Lewis, 1976), We 
feel this is a promising start on a new methodology for analyzing data such as 
that arising in computer system evaluation studies. There are many open prob- 
lems; as an example we have gone from a situation where there were no processes 
of dependent exponential random variables which we have three 

such processes with exactly the same correlation structure of the type encoun- 
tered in mixed moving-average autoregressive processes. Clearly higher-order 
spectra (Brillinger, 1972) will be needed to differentiate between the processes. 

I may add that these new processes are easy to generate on computers and 
thus will lead to data-based models which will be highly useful in simulations 
of computer systems. 
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APPENDIX 

THE SASE VI PROGRAM 



The acronym SASE VI Identifies the Fortran Computer Program for the 
Statistical Analysis of Series of E^vents, the sixth revision. SASE IV was dis- 
tributed as a non-supported program through International Business Machines in 
1969 and its procedures for usage are described by Lewis, Katcher and Weis (1969) 
which we refer to as LKW. The SASE IV program itself was correct and has proved 
to be a very useful tool, especially in the analysis of computer and neuro- 
physiological data. The explanation for its use in LKW was relatively adequate.* 
However, refinements in computation, listings and displays were deemed desirable 
as a result of usage on different data sets. In addition, changes have been made 
to facilitate its use simply as a program for the analysis of positive valued 
random variables. These occur if one does not record exact times of events, but 
counts of events in successive fixed time intervals. 

The actual changes incorporated in SASE VI were programmed by Miss Patricia 
Combs of the Naval Postgraduate School staff, and the program is available from 
the author. 

A. Description of the Program 

The description of SASE IV remains valid except as noted here. The changes 
which are incorporated into SASE VI are listed in order of appearance in the 
program. 

1 . Main Program 

Data may now have any input card format specified by the user on a param- 
eter card described later. Restrictions on the data to be positive and to have 
no more than 1999 points still remain. The main program then tests all data and 
specified parameters for correctness and prints a listing of errors found. If 
errors are detected, the program execution terminates at this point. This partic- 
ular feature may be annoying to the user whose low priority in the service queue 
gives him a long wait to find an error in a minor subroutine parameter, but it is 
felt that none the less this is desirable as rerunning of any subroutine involves 
a complete rerun of subroutine INTER; this is costly, especially with large data 
sets. Typical times to run the complete SASE IV program on an IBM Model 360/65 
with 1999 points are 15 to 20 minutes. Other timing results are given in LKW. 

2. Subroutine INTER 

The subroutine is called automatically with each run of SASE. A new one 
page histogram plot and a smoothed estimated density function plot are automati- 
cally given in addition to the expanded histogram from SASE IV. This subroutine 
may, however, call any or all of three optional subroutines which assist in 
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analysis of the data. The first of the three is SECTION , a subroutine which 
divides the data set into the specified number of equally sized sections. The 
subroutine INTER then performs its complete analysis of each section and the 
statistical results of each section plus the statistical results for total set 
are tabulated for comparison. This option thus gives one an idea of trends or 
non-stationarities evident in the sample distributions from different segments of 
time. 

The second option, subroutine TAIL , sections the complete set of ordered 
data (part of the output of subroutine INTER) into the specified number of tails, 
NT, then eliminates at each iteration, a minimum of fN/NTjof the smallest values. 
The exact number eliminated varies because the subroutine will not cut a string 
of identical valued points; it searches the ordered data until a new unique value 
is located, then the remaining points are processed by subroutine INTER. The 
main value of this feature is the verification of an exponential tail (linear log 
survivor function) in the distribution of the data. This linearity, if present, 
gives some indication of a Poisson or compound Poisson model for the data. 

The third option from subroutine INTER is JACKKNIFE . This feature com- 
putes jackknife statistics [Gray and Shucany, 1972; Miller, 1974] from chunks of 
the data from which successive equisized sections have been deleted, giving a 
total of NJ (a user specified number) sets of statistics. Jackknifing is partic- 
ularly useful for small data sets to obtain variance assessments for estimated 
opoulation parameters. 

3 . Subroutine TREND 

For each value of k, a plot of the values of the mean value and vari- 
t h 

ance of data in the i — set of k consecutive intervals has been added to sub- 
routine TREND. This provides a visual as well as numerical listing of TREND 
indications . 

4. Subroutine RHQ 

A new computation, its listing and its plot have been added to RHO. 
Specifically, it can be used to give the variance time curve for data which con- 
sists of counts of numbers of events occurring in fixed increments of time; 
otherwise it gives a variance-time sequence for the positive times between events 
in a series of events for which exact times of events have been recorded. 

From Cox and Lewis (1966, pg. 72) we know that if the covariance between 

counts in intervals of size t which have J-1, J = 1,2,..., identical 

intervals between them is then the variance of counts in k contiguous 

intervals is , , 

k-1 

V(kx) =kV(x)+2 I (k-i)C.(x) k= 1,2,... . (A.l) 

i=l ^ 
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to obtain 



From this we may use the already computed p^, i = 1,2,... 

C^(t) = V(t)p^ 

and hence V(kT), k = 1,2,... For a sequence of independent variables C^(t) is 
identically zero for i = 1,2,... so that 

V(kT) = kV(T) k = 1,2,... . (A. 2) 

i.e. the variance time curve is linear and passes through the origin. For counts 
this would be the case if we were dealing with a Poisson process. We notice that 
V(t) is the variance of the counting random variable, N(t), in one interval, 
(t,t+x]. The subroutine plots the sequence of values V(kT)/V(T)-k for 
k = 1,2,...,M where M is the minimum of 200 or one half the sample size. For 
an independent sequence of random variables, i.e. p^ = 0, for all i, 

k-1 

V(kT)/V(x)-k = ll (k-l)p. 

i=l ^ 

=0 k = 1,2, . . . . 

Graphically, then, we are looking for deviations from zero in the plot. (The 
’’variance time curve” values V(kT) must be hand plotted using the printed 
values of V(k).) 

5. Subroutine SPEC 

Two changes are incorporated into this subroutine which basically com- 
putes the spectrum of intervals. Of concern to the user only from the stand- 
point of efficiency, is a recursive form for computing sines and cosines. This 
change reduces SPEC running time by approximately 75%. 

The second change is a convenience. The plots of the estimated spectrum 
are given separately for three values of smoothing. Previously, these were super 
imposed and sometimes barely distinguishable. 

6 . Subroutine DENS 

The estimated intensity function, m^(T), sometimes called the renewal 
function, is sensitive to the size of the sampling interval, SDT, specified by 
the user. In general, the optimal size is unknown in advance, so a loop has been 
incorporated into subroutine DENS which gives five separate plots of the esti- 
mated density using the following five interval sizes in estimation: SDT, 2><SDT, 

3xSDT, AxSDT and 5xSDT. This loop also places a new restriction on the size of 
SDT. Where SDT used to be 



now we must have 



0< SDT < t 



(n) 



/5L. 



Having five plots will save rerunning the program several times in an attempt to 
examine an apparent peak in the plot. Recall that m^(x) should be a constant 
value, m, for a Poisson process. 



7. Subroutine BART 

Subroutine BART has been recoded to allow computation of the (Bartlett) 
spectrum of counts by sections. Core requirements are thus reduced by 20,000 
bytes for this subroutine. Additionally, a quadratic smoothing option has been 
added to the subroutine. The program user may select one of the following three 
options in BART: 

(a) linear smoothing and plotting of all points; 

(b) quadratic smoothing and plotting of all points; 

(c) quadratic smoothing and plotting of only the midpoint value of an 
interval of size equal to the smoothing window. 

A final convenience factor is that of automatic selection by default, 
i.e. non-specification, of most of the parameters needed by the subroutines. 

(The parameters listed are the same as those in the LKW manual for SASE IV.) 
Values chosen for this default feature are 



K = 2 

Ml = raax(2, [N/lOO]) 
M2 = 3 Ml 
M3 = 5 Ml 
BELT = x/3 
SDT = x/2 

L = 420} 

B = 2/(N-l) 

P = 3N, 



In addition, the value of the quadratic window in subroutine BART is chosen to 
be 

KQUAD = min {100, [P/5] }, 



if either of the quadratic smoothing options are selected. 

Default options are provided only for convenience and the user should be 
careful to be sure that he knows what he is computing. 
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B. Use of SASE VI 

The use of SASE VI remains essentially the same as the use of SASE IV. In 
order to avoid the necessity of two sources of information, we will list the 
complete control card sequence for calling sase VI and then some observations 
about various combinations of parameters. Details of parameters and computations 
are given in LKW. 



The first control card is a 72 space Hollerith designation of the data set. 



e.g. 



Feb 1972 hourly traffic count, location N3. 



The second 


card lists. 


by means of 


a 


non-zero entry in the 


subroutines to 


be called. 


They are 








Column 


Format 




Subroutine 




1-2 


12 




KDAT 




3-4 


12 




TREND 




5-6 


12 




RHO 




7-8 


12 




SPEC 




9-10 


12 




DURB 




11-12 


12 




VART 




13-14 


12 




COV 




15-16 


12 




DENS 




17-18 


12 




BART 




19-20 


12 




IND 




21-22 


12 




EXPO 




23-24 


12 




// SECTIONS in INTER 




25-26 


12 




// JACKKNIFES in INTER 




27-28 


12 




// TAILS in INTER 


In column 18 the entry should be 0, 1, 


2 


or 3 to obtain the des 


from subroutine 


i BART. 










Entry 






Selection 




0 


BART not 


selected 




1 


BART linear 


smoothing 




2 


BART with 


quadratic smoothing 




3 


BART with 


quadratic smoothing and 






only center points plotted. 


Card three 


is used as 


follows : 








Column 


Format 


Parameter Range 




1-10 


I 10 




N l<n<1999 




11-20 


F 10.0 




T T > 0 
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The first two, N 
may be left blank; 



Column 


Format 


Parameter 


Range 


21-30 


I 


10 


K 


1 < k ^ N/20 


31-40 


I 


10 


Ml 


2 < Ml< 999 


41-50 


I 


10 


M2 


2 < M2 < 999 


51-60 


I 


10 


M3 


2 < M3 < 999 


and T, 


must be 


specified explicitly in all cases; the others 



the program then selects the default values listed above. 



Card four is used as follows : 



Column 


Format 


Parameter 




1-10 


F 


10.0 


DELT 


t /7999 < DELT < t 
n 


11-20 


F 


10.0 


SDT 


0< SDT< t /5L 
n 


21-30 


I 


10 


L 


0 < L < 420 


31-40 


F 


10.0 


B 


0 < B 


41-50 


I 


10 


P 


0 < P < 6000 


51-60 


I 


10 


KQUAD 


0 < KQUAD ^ 200 



Card five is the input data card format statement. Starting in column 1 with 
a left parenthesis, the user lists the format and ends with a right parenthesis, 
for example, 

(6(F10.4,10X)) 

f 

column one 



C. Comments on Parameters 

When the parameter k in subroutine TREND is small, the output from TREND 
will be quite lengthy especially when N is large, since it increases in N/k. 

In subroutine BART, when KQUAD is small, the data window has strong negative 
side lobes which cause the smoothed estimates to be negative in some cases. It 
is recommended that KQUAD be greater than 50 if possible. 
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